Introduction

This is a data exploration of the 2016 US presidential election for the state of Florida. Florida is generally regarded as one of the “swing” states, on which the result of the presidential election depends, and where the two largest parties, Republicans (‘Red’) and Democrats (‘Blue’), have similar support, so the result can go either way.

The dataset comes from the Federal Election Commission.

I am aiming to gain some insight into the following questions:

  1. Which candidate received the most dollars?
  2. How much did each candidate receive per contributor, and did the contribution size matter?
  3. Where did the candidates (both Red and Blue) get the bulk of their contributions from?
  4. Did a contributor’s occupation/gender/age have any correlation with which candidate they contributed to?
  5. Can we draw a pattern between area/job/gender (type of contributor) and the party (Red or Blue)?
  6. When do people contribute the most? (A US election campaign generally lasts an entire year.)

Let’s begin by loading the basic packages for the analysis and then the data.

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

Dataset Analysis and cleaning

The dataset has 400K+ observations and 19 variables.

##     cmte_id   cand_id                 cand_nm          contbr_nm
## 1 C00580100 P80001571        Trump, Donald J.    SELLAS, KRISTEN
## 2 C00580100 P80001571        Trump, Donald J.     SELLERS, CHRIS
## 3 C00575795 P00003392 Clinton, Hillary Rodham SCHECTER, MITCHELL
## 4 C00577130 P60007168        Sanders, Bernard LERBS, CHRISTOPHER
## 5 C00575795 P00003392 Clinton, Hillary Rodham       CHOREY, LORI
## 6 C00580100 P80001571        Trump, Donald J.     PUSINS, AUDREY
##          contbr_city contbr_st contbr_zip            contbr_employer
## 1         CLEARWATER        FL      33759                    RETIRED
## 2         VERO BEACH        FL      32966 PROST BEVERAGE COMPANY LLC
## 3         PLANTATION        FL  333243808                  TERRANOVA
## 4 GREEN COVE SPRINGS        FL  320433443                       NONE
## 5        TALLAHASSEE        FL  323121800      SWEAT THERAPY FITNESS
## 6         BOCA RATON        FL      33433                       PBSO
##             contbr_occupation contb_receipt_amt contb_receipt_dt
## 1                     RETIRED             68.37        09-NOV-16
## 2 BEVERAGE INDUSTRY EXECUTIVE             80.00        19-NOV-16
## 3                  CONTROLLER             15.00        22-APR-16
## 4                NOT EMPLOYED             50.00        05-MAR-16
## 5             FITNESS TRAINER            100.00        06-APR-16
## 6             LAW ENFIRCEMENY             76.89        02-DEC-16
##   receipt_desc memo_cd                           memo_text form_tp
## 1                    X                                        SA18
## 2                    X                                        SA18
## 3                    X              * HILLARY VICTORY FUND    SA18
## 4                      * EARMARKED CONTRIBUTION: SEE BELOW   SA17A
## 5                    X              * HILLARY VICTORY FUND    SA18
## 6                    X                                        SA18
##   file_num     tran_id election_tp
## 1  1146165 SA18.145176       G2016
## 2  1146165 SA18.120235       G2016
## 3  1091718    C4746745       P2016
## 4  1077404 VPF7BKX6F26       P2016
## 5  1091718    C4682258       P2016
## 6  1146165 SA18.133502       G2016

The data is mostly categorical, with most of the features being text and numbers. Certain variables like file_num, tran_id and memo_text might not be useful for our analysis; I will consider removing them from the data frame if needed.

Exploring the unique values of the election type column

## # A tibble: 5 x 2
##   election_tp  `n()`
##        <fctr>  <int>
## 1               1244
## 2       G2016 153073
## 3       O2016     54
## 4       P2016 271685
## 5       P2020      1

We can see that most of the contributions recorded are for the 2016 primaries (around 270k), with around 150k of them for the general election. We’ll filter the data on this column to look at the primaries and the general election separately where needed. Next, checking the number of candidates that received contributions in the entire election, we’ll remove those candidates that received fewer than 1000 contributions.

## # A tibble: 10 x 2
##                      cand_nm  `n()`
##                       <fctr>  <int>
##  1                 Bush, Jeb   6045
##  2       Carson, Benjamin S.  16074
##  3   Clinton, Hillary Rodham 184378
##  4 Cruz, Rafael Edward 'Ted'  29153
##  5            Fiorina, Carly   2056
##  6           Kasich, John R.   1342
##  7                Paul, Rand   2028
##  8              Rubio, Marco  20472
##  9          Sanders, Bernard  82523
## 10          Trump, Donald J.  78970
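
The filtering step described above could be sketched roughly as follows. This is a minimal base-R sketch on a made-up toy data frame, not the code behind this report; the names `contrib`, `min_contributions` and `contrib_major` are assumptions for illustration, and the threshold is lowered from 1000 to 3 so that the toy data shows the effect.

```r
# Toy data: one row per contribution (row counts are made up for illustration).
contrib <- data.frame(
  cand_nm = c(rep("Clinton, Hillary Rodham", 5),
              rep("Trump, Donald J.", 4),
              "Paul, Rand"),
  stringsAsFactors = FALSE
)

# Hypothetical threshold for the toy data; the report itself uses 1000.
min_contributions <- 3

# Count contributions per candidate, then keep only candidates at or above the threshold.
cand_counts <- table(contrib$cand_nm)
keep <- names(cand_counts)[cand_counts >= min_contributions]
contrib_major <- contrib[contrib$cand_nm %in% keep, , drop = FALSE]

print(table(contrib_major$cand_nm))
```

Counting first and filtering second keeps the threshold in one place, so it is easy to change later.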

There is no ‘Party’ column in the dataset to identify which party each candidate belonged to.

Adding a new column ‘Party’ to the dataset.

## # A tibble: 6 x 18
## # Groups:   cand_nm [3]
##     cmte_id   cand_id                 cand_nm          contbr_nm
##       <chr>    <fctr>                  <fctr>             <fctr>
## 1 C00580100 P80001571        Trump, Donald J.    SELLAS, KRISTEN
## 2 C00580100 P80001571        Trump, Donald J.     SELLERS, CHRIS
## 3 C00575795 P00003392 Clinton, Hillary Rodham SCHECTER, MITCHELL
## 4 C00577130 P60007168        Sanders, Bernard LERBS, CHRISTOPHER
## 5 C00575795 P00003392 Clinton, Hillary Rodham       CHOREY, LORI
## 6 C00580100 P80001571        Trump, Donald J.     PUSINS, AUDREY
## # ... with 14 more variables: contbr_city <fctr>, contbr_st <fctr>,
## #   contbr_zip <dbl>, contbr_employer <fctr>, contbr_occupation <fctr>,
## #   contb_receipt_amt <dbl>, contb_receipt_dt <fctr>, receipt_desc <fctr>,
## #   memo_cd <fctr>, memo_text <fctr>, form_tp <fctr>, file_num <int>,
## #   tran_id <fctr>, election_tp <fctr>
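
One way the party column could be derived is with a named lookup vector keyed on the candidate name. This is only a sketch under my own assumptions (the names `party_lookup` and `contrib` are hypothetical, and the toy data frame holds just four rows); the party labels follow the ‘Democrat’/‘Republican’ values that appear in later output.

```r
# Lookup from candidate name (as spelled in cand_nm) to party.
party_lookup <- c(
  "Bush, Jeb"                 = "Republican",
  "Carson, Benjamin S."       = "Republican",
  "Clinton, Hillary Rodham"   = "Democrat",
  "Cruz, Rafael Edward 'Ted'" = "Republican",
  "Fiorina, Carly"            = "Republican",
  "Kasich, John R."           = "Republican",
  "Paul, Rand"                = "Republican",
  "Rubio, Marco"              = "Republican",
  "Sanders, Bernard"          = "Democrat",
  "Trump, Donald J."          = "Republican"
)

# Toy data frame standing in for the full contributions table.
contrib <- data.frame(
  cand_nm = c("Trump, Donald J.", "Clinton, Hillary Rodham",
              "Sanders, Bernard", "Rubio, Marco"),
  stringsAsFactors = FALSE
)

# Index the lookup by name; as.character() guards against cand_nm being a factor.
contrib$party <- unname(party_lookup[as.character(contrib$cand_nm)])
print(contrib)
```

Any candidate missing from the lookup would get `NA`, which makes unmapped names easy to spot.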

Some zip codes are stored as 9-digit numbers, so we extract the first 5 digits for standardization.

## # A tibble: 6 x 19
## # Groups:   cand_nm [3]
##     cmte_id   cand_id                 cand_nm          contbr_nm
##       <chr>    <fctr>                  <fctr>             <fctr>
## 1 C00580100 P80001571        Trump, Donald J.    SELLAS, KRISTEN
## 2 C00580100 P80001571        Trump, Donald J.     SELLERS, CHRIS
## 3 C00575795 P00003392 Clinton, Hillary Rodham SCHECTER, MITCHELL
## 4 C00577130 P60007168        Sanders, Bernard LERBS, CHRISTOPHER
## 5 C00575795 P00003392 Clinton, Hillary Rodham       CHOREY, LORI
## 6 C00580100 P80001571        Trump, Donald J.     PUSINS, AUDREY
## # ... with 15 more variables: contbr_city <fctr>, contbr_st <fctr>,
## #   contbr_zip <chr>, contbr_employer <fctr>, contbr_occupation <fctr>,
## #   contb_receipt_amt <dbl>, contb_receipt_dt <fctr>, receipt_desc <fctr>,
## #   memo_cd <fctr>, memo_text <fctr>, form_tp <fctr>, file_num <int>,
## #   tran_id <fctr>, election_tp <fctr>, party <chr>
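
The zip code standardization could look something like the sketch below. It assumes the zip codes were read as numbers (as the `<dbl>` type in the earlier output suggests) and that no leading zeros were lost, which should hold for Florida since its zip codes start with 3. The variable names are hypothetical.

```r
# Zip codes taken from the first rows shown earlier: a mix of 5- and 9-digit values.
zip_raw <- c(33759, 32966, 333243808, 320433443)

# Convert to text without scientific notation, then keep the first 5 characters.
zip_chr <- sprintf("%.0f", zip_raw)
zip5 <- substr(zip_chr, 1, 5)

print(zip5)
```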

Univariate Analysis

Analysing some data points related to the date of contribution, the party contributed to, and candidate-wise contributions.

Let’s start with the date of contribution.

To get more meaningful numbers, we’ll look at how many days before election day (8th Nov 2016) each contribution was made.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -53.0    69.0   147.0   174.4   254.0  1134.0
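
The days-before-election value could be computed along these lines. This is a hedged sketch rather than the report’s own code: `parse_fec_date()` is a hypothetical helper, it assumes all two-digit years belong to the 2000s, and it parses the month by hand so that the result does not depend on the system locale.

```r
# Election day used as the reference point in the text above.
election_day <- as.Date("2016-11-08")

# Parse dates such as "09-NOV-16" without relying on the system locale
# (as.Date() with "%b" depends on the locale's month abbreviations).
parse_fec_date <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  day   <- as.integer(vapply(parts, `[`, character(1), 1))
  month <- match(toupper(vapply(parts, `[`, character(1), 2)), toupper(month.abb))
  year  <- 2000L + as.integer(vapply(parts, `[`, character(1), 3))  # assumes years 20xx
  as.Date(sprintf("%04d-%02d-%02d", year, month, day))
}

# Dates taken from the first rows of the data shown earlier.
receipt_dt <- c("09-NOV-16", "22-APR-16", "05-MAR-16", "02-DEC-16")

# Positive = before election day, negative = after election day.
days_before <- as.integer(election_day - parse_fec_date(receipt_dt))
print(days_before)
```

With this sign convention, negative values such as the minimum of -53 correspond to contributions recorded after election day.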

We can see that the median contribution was made ~5 months (147 days) before the election, just after the primaries, when the campaigning was in full swing. The minimum value is -53; negative values mean that someone donated after the election was over, so these are most probably outliers.

Let’s see the trends for the date of contribution.

## Warning: Removed 13232 rows containing non-finite values (stat_bin).

From the plot we can see that there was some hike in contributions around the primaries (244-300 days before the election). Many people donated around the 100-day mark, when the election campaign was under way, and there is a spike in contributions just before election day, which can be attributed to the final surge of campaigning by both candidates.

The trend can also be seen around the primaries: there was a surge of contributions just before them, and just after them the contributions dropped off.

Let’s now see how the contributions were candidate wise and party wise

First checking the total contributions for both the major parties

## Warning: Ignoring unknown parameters: binwidth, bins, pad

As we can see, despite the fact that there were many more Republican candidates than Democratic candidates, the number of contributions to Democrats is much higher. We’ll assess the value of these contributions in the next section.

Checking the Contributions per candidate

Surprisingly, Hillary Clinton received the largest number of contributions in the state of Florida. On the Republican side, Jeb Bush and Marco Rubio were local candidates, and even they received fewer contributions.

This might be because Jeb Bush dropped out of the race before the Florida primary and Marco Rubio did not emerge as the Republican candidate at the end of the primaries.

Contribution percentage per candidate

Let’s see each candidate’s contributions as a proportion of the total number of contributions. Let’s also analyse how this differed between the primaries and the general election.

Starting with the primaries (filtering out candidates who had very few contributions)

We can see that Hillary Clinton and Bernie Sanders had most of the contributions, with Trump having the most on the Republican side.

Let’s see how it changed for the general election, where only 2 candidates were left.

It seems Hillary Clinton had overwhelmingly more contributions than Donald Trump in the general election.

It’s interesting to note that even for the general election, people contributed to other candidates who had dropped out of the race.

Another interesting point is that even though Hillary Clinton had a larger number of contributions, she ended up losing Florida.

Which election had more contributions: General or Primaries?

Let’s see whether the primaries or the general election had the larger number of contributions.

## Warning: Ignoring unknown parameters: binwidth, bins, pad

We can clearly see that there were many more contributions for the primaries than for the general election.

Bivariate Analysis

Let’s do further analysis on election data set.

Party wise contribution amount

Let’s see the contributions by separating the Democrats and Republicans.

## Warning: Ignoring unknown parameters: binwidth, bins, pad

It looks like the number of contributions for Republicans varied by a large amount between the general election and the primaries: the count of contributions decreased significantly in the general election, while it was much more consistent for Democrats.

It’s worth noting that in the general election, Republican candidate Donald Trump emerged victorious.

Amount of contributions.

Let’s see how much each candidate received.

We can clearly see that Donald Trump on the Republican side and Hillary Clinton on the Democratic side received the highest total contributions.

Now let’s see, for these candidates, how much the average contribution was.

## # A tibble: 6 x 3
## # Groups:   cand_nm [6]
##                     cand_nm      party   avg_fund
##                      <fctr>      <chr>      <dbl>
## 1                 Bush, Jeb Republican 1093.95272
## 2       Carson, Benjamin S. Republican  113.98395
## 3   Clinton, Hillary Rodham   Democrat  118.87217
## 4 Cruz, Rafael Edward 'Ted' Republican   97.68068
## 5            Fiorina, Carly Republican  208.08759
## 6           Kasich, John R. Republican  566.98261
## Warning: Ignoring unknown parameters: binwidth, bins, pad
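
A per-candidate average like the one above could be computed roughly as follows. This is a base-R sketch on the six sample rows shown at the top of the report, so the numbers will not match the full-data table; the variable names `avg_fund` and `median_fund` are my own. I also compute the median here, on the assumption that it is a useful companion because it is less sensitive to a handful of very large contributions.

```r
# The six sample contributions from the head of the dataset.
contrib <- data.frame(
  cand_nm = c("Trump, Donald J.", "Trump, Donald J.", "Clinton, Hillary Rodham",
              "Sanders, Bernard", "Clinton, Hillary Rodham", "Trump, Donald J."),
  contb_receipt_amt = c(68.37, 80.00, 15.00, 50.00, 100.00, 76.89),
  stringsAsFactors = FALSE
)

# Split the amounts by candidate, then summarise each group.
amt_by_cand <- split(contrib$contb_receipt_amt, contrib$cand_nm)
avg_fund    <- sapply(amt_by_cand, mean)
median_fund <- sapply(amt_by_cand, median)

print(round(avg_fund, 2))
print(median_fund)
```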

We can clearly see that Jeb Bush had a much higher average contribution than the others; this is primarily due to the fact that a few of the contributions for Jeb Bush were extraordinarily high. While Hillary Clinton had the largest number of contributions, as we saw earlier, her average contribution was much lower.

It is looking more and more like it is the number of contributions that counts rather than the amount of each contribution.

Contribution amount by party

How did the contribution amount vary with party? Using a binwidth of 10 and limiting the amount to <1000, as that is where most of the contributions were.

## Warning: Removed 10428 rows containing non-finite values (stat_bin).
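
To make the binning concrete, here is a small base-R sketch of the same idea: restrict the amounts to a range, then count them in bins of width 10. It uses `hist(..., plot = FALSE)` instead of the plotting code behind this report, and the last two amounts are made up to show what falls outside the range. Rows outside the axis limits are presumably what the “Removed ... rows” warning above is counting.

```r
# First six amounts come from the sample rows; the last two are made up
# (one very large contribution and one negative refund-style entry).
amt <- c(68.37, 80.00, 15.00, 50.00, 100.00, 76.89, 2700.00, -50.00)

# Keep only amounts inside the plotted range (0, 1000).
in_range <- amt[amt > 0 & amt < 1000]
n_dropped <- length(amt) - length(in_range)

# Count contributions per bin of width 10 without drawing a plot.
bins <- hist(in_range, breaks = seq(0, 1000, by = 10), plot = FALSE)

print(n_dropped)
print(bins$counts[1:10])
```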

Using a binwidth of 5 and limiting the amount to <300 to see some variation party-wise.

## Warning: Ignoring unknown parameters: alplha
## Warning: Removed 30871 rows containing non-finite values (stat_bin).

We can see that the contribution amount was generally higher for Republicans than it was for Democrats.

Representing the same in a box plot, to see if we can gather some more insights

The median amount for Republicans is higher than that for Democrats.

Let’s see if this holds true if we only consider the general election.

Even in the general election, the contribution amount for Republicans was higher than that for Democrats.

In fact, when comparing the two plots above, we can see that the Republican contribution amount increased for the general election, while the Democratic contribution amount remained mostly the same. While the total contributions for Hillary Clinton were still higher than those for Donald Trump, if we consider only the general election, the donation amount for Republicans is higher than that for Democrats. This is telling, as the final result for the state of Florida was Donald Trump winning the general election. Florida, being a swing state, played an important role in the election outcome.

How was the contribution distributed for candidates when measured in days from the election

## Warning: Ignoring unknown aesthetics: fill
## Warning: Removed 3807 rows containing non-finite values (stat_bin).

We can see that around 300 days from the election, around the primaries, each candidate had a jump in contributions.

Similarly, towards the end, Donald Trump and Hillary Clinton had a jump in contributions, with a particularly large spike for Clinton.

Multivariate Analysis

Contributions per Occupation

Let’s see if we can draw any conclusions from the occupations of the highest contributors.

We can see that for both parties, retirees made the largest contribution; for Republicans in particular, a lot of the contributions came from retired individuals.

We can even see how the range of contributors varied between Republicans and Democrats.

For Republicans, the majority of donations came from retirees, homemakers, contractors and business. For Democrats, apart from retirees, a lot of not-employed people donated.

Cumulative Contribution by candidate with Date

Let’s analyse the contribution distribution for different candidates and how it varied with time.

We can see that the rise and fall in contributions to both parties’ candidates was consistent across the last 100 days before the election.

A few interesting points to note here: initially, contributions for Donald Trump were huge compared to Hillary Clinton’s, but after that they were fairly consistent, with Hillary finishing strong at the end.
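
A cumulative total per candidate over time could be built roughly like this. It is a base-R sketch with made-up amounts and hypothetical column names (`days_before`, `amt`, `cum_amt`), not the data or code behind the plot discussed above.

```r
# Made-up contributions for two candidates, keyed by days before the election.
contrib <- data.frame(
  cand_nm = c("Trump, Donald J.", "Clinton, Hillary Rodham", "Trump, Donald J.",
              "Clinton, Hillary Rodham", "Trump, Donald J."),
  days_before = c(100, 90, 60, 30, 10),
  amt = c(50, 20, 25, 100, 10),
  stringsAsFactors = FALSE
)

# Sort chronologically within each candidate (largest days_before = earliest).
contrib <- contrib[order(contrib$cand_nm, -contrib$days_before), ]

# Running total per candidate; ave() applies cumsum() within each group.
contrib$cum_amt <- ave(contrib$amt, contrib$cand_nm, FUN = cumsum)

print(contrib)
```

Sorting before calling `cumsum()` matters here, because the running total follows row order within each candidate.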

Comparison of contributions Democrat/Republican and how it varied over time

Comparison of average contribution to Democrats and Republicans on a month-by-month basis

This is a great representation showing how the average contribution varied with time for Democrats and Republicans. We can see that there was a huge difference between donations for Republicans and Democrats, with those for Republicans being much higher, although the difference was much smaller towards the end of the race. There was never a point at which contributions to Democrats soared above those to Republicans.

The one negative point in the representation can be considered an outlier, as it is most likely related to the return/rearrangement of funds after several candidates dropped out of the race near the primaries.
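
A month-by-month average per party could be sketched as below. The data is made up, including one negative refund-style amount to show how a single month can go negative, and the column names `receipt_date`, `month` and `amt` are assumptions of mine.

```r
# Made-up contributions; the negative amount stands in for a refund.
contrib <- data.frame(
  receipt_date = as.Date(c("2016-03-05", "2016-03-20", "2016-04-06",
                           "2016-04-22", "2016-04-25")),
  party = c("Democrat", "Republican", "Democrat", "Democrat", "Republican"),
  amt = c(50, 80, 100, 15, -200),
  stringsAsFactors = FALSE
)

# Bucket each contribution into a year-month label, e.g. "2016-04".
contrib$month <- format(contrib$receipt_date, "%Y-%m")

# Average amount per month and party.
monthly <- aggregate(amt ~ month + party, data = contrib, FUN = mean)
print(monthly)
```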

Analysing the average contribution amount for candidates

Here we can see how the average contribution for each candidate changed over the course of the 1 year of the election campaign.

Analysing which cities the most contributions came from

Now we’ll see how much contribution came from different cities. First, let’s plot where the largest amounts came from, keeping cities with total funds > 50000.

There is a lot of variation from city to city.

## # A tibble: 1 x 3
## # Groups:   contbr_city [1]
##   contbr_city    party cand_funds
##        <fctr>    <chr>      <dbl>
## 1       MIAMI Democrat    2278152

We can see that the top contributor is Miami, with total funds of 2,278,152.
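
The city totals could be computed along these lines. This is a sketch on made-up amounts, with the threshold lowered to a hypothetical 100 (the report uses 50000); only the column names `contbr_city`, `party` and `cand_funds` mirror the output above.

```r
# Made-up contributions by city and party.
contrib <- data.frame(
  contbr_city = c("MIAMI", "MIAMI", "TALLAHASSEE", "BOCA RATON", "MIAMI", "BOCA RATON"),
  party = c("Democrat", "Democrat", "Democrat", "Republican", "Republican", "Republican"),
  contb_receipt_amt = c(100, 250, 40, 80, 60, 75),
  stringsAsFactors = FALSE
)

# Hypothetical threshold for the toy data; the report uses 50000.
min_total <- 100

# Total funds per city and party.
city_funds <- aggregate(contb_receipt_amt ~ contbr_city + party, data = contrib, FUN = sum)
names(city_funds)[3] <- "cand_funds"

# Keep the larger totals and sort from highest to lowest.
city_funds <- city_funds[city_funds$cand_funds > min_total, ]
city_funds <- city_funds[order(-city_funds$cand_funds), ]

top_city <- city_funds[1, ]
print(top_city)
```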

Top 1% of contributions from which cities?

## Warning: Ignoring unknown parameters: binwidth, bins, pad, alplha

When analysing the top 1% of contributions, we see much less variation.
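
One way to isolate the top 1% of contributions is to cut at the 99th percentile of the amount. The sketch below does this on made-up data (amounts 1 to 200 spread over four cities); the names `cutoff` and `top1` are hypothetical, and the exact definition of “top 1%” used for the plot may differ.

```r
# Made-up data: 200 contributions with amounts 1..200 across four cities.
contrib <- data.frame(
  contbr_city = rep(c("MIAMI", "TAMPA", "ORLANDO", "NAPLES"), times = 50),
  contb_receipt_amt = 1:200,
  stringsAsFactors = FALSE
)

# 99th percentile of the contribution amount.
cutoff <- quantile(contrib$contb_receipt_amt, probs = 0.99, names = FALSE)

# Contributions at or above the cutoff, then a count per city.
top1 <- contrib[contrib$contb_receipt_amt >= cutoff, ]
print(cutoff)
print(table(top1$contbr_city))
```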

Visualizing which areas the candidates’ contributions came from

For visualizing the maps we’ll leverage the maps package. To use the maps package we need to convert the zip codes to lat/long, using the ‘zipcode’ package.

We can see that the contributions are fairly well spread out and come from all over the state.

Now let’s see this candidate-wise: where did each candidate get the most contributions from?

It looks like Donald Trump got contributions from everywhere in the state.

Let’s subset this dataset further to only the general election and compare Democrats and Republicans.

We already know that Democrats got more contributions than Republicans.

For both parties the contributions were more or less evenly spread out, but it looks like many of the Republican contributions came from certain strongholds, which is why it looks as if more donations went to Republicans.

Final Plots and Summary

In this section, we brush up and analyse the best looking and most informative plots we discovered above.

Plot One - Donation distribution by candidate

Description One

This shows how the contributions varied for each candidate. Most contributions went to a few candidates, with Hillary Clinton, Donald Trump and Bernie Sanders being the top candidates by number of contributions received.

Plot Two - Mapping Dem/Rep by Area

Description Two

When plotting the entire campaign contribution data for Democrats and Republicans, we can see that contributions for Democrats were evenly distributed, while for Republicans the bulk of the contributions came from specific areas, especially in the west of Florida.

Plot Three - Contribution to candidate over time.

Description Three

We earlier saw how the contribution amount varied for the candidates over the last 100 days. This plot shows how the average contributions to candidates varied over the 1 year before the election, and how they increased, in particular for Donald Trump, at the end.

Reflection

Issues

I encountered a few issues while doing this analysis, primarily inadequate data, particularly gender and income data.

I believe that if we had contributor age/gender data available as well, it would have generated interesting data points, and a lot could have been analysed based on age/gender and which candidate each major age group / gender contributed to.

It would also have been interesting to see the contributions with respect to the mean income for each city and draw analysis from that.

Conclusion and Summary

By looking at the donation data we can catch a glimpse of how the candidates polled during the primaries and the general election.

It is interesting to note that Democrats got a larger number of contributions in the state of Florida, with Hillary Clinton getting the most contributions. Even for the general election there were far fewer contributions for Republicans. Still, Donald Trump, the Republican candidate, won the state in the election.

I was able to answer most of the questions I posed before the analysis.

Future Analysis

We can do even further analysis, in particular a spending analysis for each candidate (Hillary Clinton and Donald Trump) leading up to the general election, and see whether the combination of campaign contribution and spending data had any correlation with the ultimate result, i.e. Donald Trump winning the state of Florida.

I would love to combine this data with Florida Election Watch and then draw even more analysis from the combined dataset.
